Primary exercises

  1. Create and investigate a list.
    Three students received different sets of grades (Amy: 1,6,7,9,10; Bob: 6,7,4,3,5,2,2,1,4; Dan: 9,9,10).
    Create a list in a variable scores: the names of the list elements should be the students’ names, and the values their grades.
    Print the list, and then its class, length and structure (str).
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)
scores
$Amy
[1]  1  6  7  9 10

$Bob
[1] 6 7 4 3 5 2 2 1 4

$Dan
[1]  9  9 10
class( scores )
[1] "list"
length( scores )
[1] 3
str( scores )
List of 3
 $ Amy: num [1:5] 1 6 7 9 10
 $ Bob: num [1:9] 6 7 4 3 5 2 2 1 4
 $ Dan: num [1:3] 9 9 10
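Besides class, length and str, a few other base R functions give a quick overview of a named list. A small sketch: names lists the element names, lengths (R 3.2.0 or later) counts the grades per student, and sapply applies a function such as mean to every element.

```r
# Recreate the list from the exercise
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)

names( scores )         # element names: "Amy" "Bob" "Dan"
lengths( scores )       # number of grades per student: Amy 5, Bob 9, Dan 3
sapply( scores, mean )  # mean grade per student, as a named numeric vector
```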
  2. Add an element; change an element.
    Reuse scores from the previous exercise.
    Add the grades for Eve (7,3,5,8,8,9) to it and print the list.
    Then merge Dan’s new grades (8,8,6,7) with his existing grades (hint: use the combine function c to join Dan’s existing grades with the new ones, then store the result back in scores; do not retype 9,9,10).
scores[[ 'Eve' ]] <- c(7,3,5,8,8,9)
scores
$Amy
[1]  1  6  7  9 10

$Bob
[1] 6 7 4 3 5 2 2 1 4

$Dan
[1]  9  9 10

$Eve
[1] 7 3 5 8 8 9
scores[[ "Dan" ]] <- c( scores[[ "Dan" ]], c(8,8,6,7) )
scores
$Amy
[1]  1  6  7  9 10

$Bob
[1] 6 7 4 3 5 2 2 1 4

$Dan
[1]  9  9 10  8  8  6  7

$Eve
[1] 7 3 5 8 8 9
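The $ operator also works on the left-hand side of an assignment, and base R’s append is an alternative to c for extending a vector. A sketch of the same two steps written that way; it gives the same result as the solution above and is only one possible variant.

```r
scores <- list(
  Amy = c( 1,6,7,9,10 ),
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)

scores$Eve <- c( 7,3,5,8,8,9 )                    # same as scores[[ "Eve" ]] <- ...
scores$Dan <- append( scores$Dan, c( 8,8,6,7 ) )  # same as c( scores$Dan, c( 8,8,6,7 ) )

scores$Dan   # 9 9 10 8 8 6 7
```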
  3. Single and double bracket operators.
    Reuse scores from the previous exercises.
    Investigate the difference between scores[[ "Bob" ]] and scores[ "Bob" ].
    Look at what is printed and at the class of each result.
    Then compare scores[[ c( "Amy", "Bob" ) ]] with scores[ c( "Amy", "Bob" ) ].
    Understand why an error is reported.
scores[[ "Bob" ]]             # Returns the value of Bob element (vector)
[1] 6 7 4 3 5 2 2 1 4
scores[ "Bob" ]               # Creates a new list with only Bob there (list)
$Bob
[1] 6 7 4 3 5 2 2 1 4
class( scores[[ "Bob" ]] )
[1] "numeric"
class( scores[ "Bob" ] )
[1] "list"
scores[[ c( "Amy", "Bob" ) ]] # [[ ]] returns one element; a vector of names means scores[[ "Amy" ]][[ "Bob" ]]
Error in scores[[c("Amy", "Bob")]]: subscript out of bounds
scores[ c( "Amy", "Bob" ) ]   # [ ] returns a sub-list, so several elements are fine
$Amy
[1]  1  6  7  9 10

$Bob
[1] 6 7 4 3 5 2 2 1 4
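The error comes from how [[ ]] treats a vector of names: it indexes recursively, so scores[[ c( "Amy", "Bob" ) ]] is read as scores[[ "Amy" ]][[ "Bob" ]], and Amy’s unnamed grade vector has no element called Bob. The sketch below uses a small, hypothetical nested list (grades) in which recursive indexing does succeed, next to the flat case that fails.

```r
# Hypothetical nested list: each student holds a list of topics
grades <- list(
  Amy = list( math = c( 1,6,7 ), biology = c( 7,6,8 ) ),
  Bob = list( math = c( 6,7,4 ) )
)

# A vector inside [[ ]] walks down one level per name
grades[[ c( "Amy", "biology" ) ]]   # same as grades[[ "Amy" ]][[ "biology" ]]

# Single brackets with the same kind of vector select two top-level elements
grades[ c( "Amy", "Bob" ) ]         # a list holding Amy and Bob

# In a flat list the second step has nothing to find, hence the error
flat <- list( Amy = c( 1,6,7 ), Bob = c( 6,7,4 ) )
err <- tryCatch( flat[[ c( "Amy", "Bob" ) ]], error = function( e ) conditionMessage( e ) )
err                                 # the "subscript out of bounds" message
```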
  4. Dollar operator.
    Reuse scores from the previous exercises.
    Investigate the (lack of) difference between scores$Bob and scores[[ "Bob" ]].
    Look at what is printed and at the class of each result.
    Then compare scores$Bo with scores[[ "Bo" ]].
    Understand why NULL is returned.
scores$Bob        # another way to access Bob
[1] 6 7 4 3 5 2 2 1 4
scores[[ "Bob" ]] # get an element with exact name Bob
[1] 6 7 4 3 5 2 2 1 4
class( scores$Bob )
[1] "numeric"
class( scores[[ "Bob" ]] )
[1] "numeric"
scores$Bo         # $ matches names partially: "Bo" uniquely matches Bob
[1] 6 7 4 3 5 2 2 1 4
scores[[ "Bo" ]]  # [[ ]] matches names exactly by default; there is no "Bo", so NULL
NULL
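$ accepts a unique prefix of a name, while [[ ]] matches exactly unless its exact argument is set to FALSE. When a prefix fits more than one name, $ cannot decide and returns NULL as well. A small sketch; the student Bonnie is hypothetical and added only to create an ambiguous prefix.

```r
scores <- list(
  Bob = c( 6,7,4,3,5,2,2,1,4 ),
  Dan = c( 9,9,10 )
)

scores[[ "Bo", exact = FALSE ]]   # partial matching on request: Bob's grades

scores$Bonnie <- c( 5,5,5 )       # hypothetical student sharing the prefix "Bo"
scores$Bo                         # ambiguous prefix (Bob or Bonnie): NULL
scores$Bon                        # unique prefix again: Bonnie's grades
```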

Extra exercises

  1. A list returned by a function; test for association/correlation.
    For this exercise we need two random numerical vectors.
    Let’s create x and y, each of 30 elements sampled from the standard normal distribution: x <- rnorm( 30 ) and y <- rnorm( 30 ).
    Print these vectors. You may also produce a scatter plot: plot( x, y ).

    The function cor.test tests for association between corresponding elements of two vectors.
    Use h <- cor.test( x, y ) and print h to see a report of the association test.
    Internally, h is stored as a list. Print the names of the elements stored in h.
    Now read the help page for cor.test (?cor.test). Its Value section describes the elements of h.
    Get the values of the elements estimate and p.value directly.
x <- rnorm( 30 )
y <- rnorm( 30 )
x
 [1]  0.30224512 -0.76412523 -0.36448813 -0.83197398 -0.39720935  0.98171964
 [7] -0.88489484 -0.40106485 -0.90992648 -0.72197350 -1.29670559  0.98123329
[13]  1.07306854  0.45327181  0.20107139 -1.22899661  0.21656891  0.37933133
[19]  0.66863952  0.75020644  0.84037902 -0.83746753 -0.60576637  0.42884964
[25]  0.20984171 -2.56704390 -0.04686572 -0.35520373  0.95793973  1.03854653
y
 [1]  0.97461619  0.27001414 -0.27148205  0.48083874  0.15700707  0.72397354
 [7] -0.14040680 -0.03102483 -0.03932625  0.30306325  1.03404769  1.10035770
[13] -0.98739543 -0.17006043  0.75054006 -1.46963522  0.23329117 -1.09366987
[19] -0.68887332 -2.15925851  1.18961161 -0.45360931 -1.16454263  1.49702588
[25]  0.53555637  0.88893491  0.98099870  0.69283695  0.96069868  1.75198049
plot( x, y )

h <- cor.test( x, y )
h

    Pearson's product-moment correlation

data:  x and y
t = 0.46131, df = 28, p-value = 0.6481
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.2822500  0.4335541
sample estimates:
       cor 
0.08685067 
names( h )
[1] "statistic"   "parameter"   "p.value"     "estimate"    "null.value" 
[6] "alternative" "method"      "data.name"   "conf.int"   
h[[ 'estimate' ]]
       cor 
0.08685067 
h[[ 'p.value' ]]
[1] 0.6481373
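Because h is an ordinary list (with the class "htest" attached), the other components described in the Value section can be extracted the same way. A sketch; set.seed is added only so that the random draws are reproducible, so the numbers will differ from those printed above.

```r
set.seed( 1 )            # only to make the random numbers reproducible
x <- rnorm( 30 )
y <- rnorm( 30 )
h <- cor.test( x, y )

class( h )               # "htest"
is.list( h )             # TRUE: components are accessed like list elements
h$conf.int               # confidence interval, with a conf.level attribute
h$statistic              # the t statistic, named "t"
unname( h$estimate )     # the correlation without its "cor" name
```

For the default Pearson method, unname( h$estimate ) should equal cor( x, y ).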
  2. A nested list.
    Let’s extend scores so that each student has grades for several topics (see the code below).
    Check the class and str of scores.
    Calculate how many students are in the scores list.
    Get Dan’s scores in physics.
scores <- list(
  Amy = list(
    math = c( 1,6,7,9,10 ),
    biology = c( 7,6,8 )
  ),
  Bob = list(
    math = c( 6,7,4,3,5,2,2,1,4 ),
    physics = c( 8,7 )
  ),
  Dan = list(
    math = c( 9,9,10 ),
    physics = c( 10, 10, 10 ),
    biology = c( 3, 5, 7 )
  )
)
class( scores )
[1] "list"
str( scores )
List of 3
 $ Amy:List of 2
  ..$ math   : num [1:5] 1 6 7 9 10
  ..$ biology: num [1:3] 7 6 8
 $ Bob:List of 2
  ..$ math   : num [1:9] 6 7 4 3 5 2 2 1 4
  ..$ physics: num [1:2] 8 7
 $ Dan:List of 3
  ..$ math   : num [1:3] 9 9 10
  ..$ physics: num [1:3] 10 10 10
  ..$ biology: num [1:3] 3 5 7
length( scores )      # number of students
[1] 3
length( scores$Bob )  # number of topics for which Bob has scores
[1] 2
scores[[ "Dan" ]][[ "physics" ]]
[1] 10 10 10
scores$Dan$physics
[1] 10 10 10
scores$Dan[[ "physics" ]]
[1] 10 10 10
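Since the students do not all have the same topics, it can help to check which topics exist before reaching into the list. A sketch reusing the nested scores above; lengths, sapply and %in% are base R, and hasPhysics is a name introduced only for this example.

```r
scores <- list(
  Amy = list( math = c( 1,6,7,9,10 ), biology = c( 7,6,8 ) ),
  Bob = list( math = c( 6,7,4,3,5,2,2,1,4 ), physics = c( 8,7 ) ),
  Dan = list( math = c( 9,9,10 ), physics = c( 10,10,10 ), biology = c( 3,5,7 ) )
)

lengths( scores )          # number of topics per student: Amy 2, Bob 2, Dan 3
scores$Amy$physics         # Amy has no physics grades, so this is NULL

# Which students have physics grades?
hasPhysics <- sapply( scores, function( s ) "physics" %in% names( s ) )
names( scores )[ hasPhysics ]      # "Bob" "Dan"

scores[[ c( "Dan", "physics" ) ]]  # recursive indexing, same as scores$Dan$physics
```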

Multitopic exercises

  1. (ADV) Split a table into a list of tables by a column factor; merge back.
    Some functions require their input as a list of tables.
    Let’s assume that the pulse table should be split into a list of parts based on the exercise column.
    Load the pulse.csv data into a variable pulse.
    Try l <- pulse %>% split( .$exercise ) and investigate the class, length and names of the result l.
    Use double square brackets to extract the part where exercise is low.
    Finally, check that applying bind_rows to l recreates the pulse table (with the rows in a different order).
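library( tidyverse )                # for read_csv, %>%, bind_rows and write_csv
pulse <- read_csv( "pulse.csv" )    # assumes pulse.csv is in the working directory
l <- pulse %>% split( .$exercise )  # . represents the object on the left side of %>%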
class( l )
[1] "list"
length( l )
[1] 3
names( l )
[1] "high"     "low"      "moderate"
l[[ "low" ]]
# A tibble: 37 × 13
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_E Lauri    173     64    18 female no     yes     low     sat       90     88
 2 1993_F Geor…    184     74    22 male   no     yes     low     ran       78    141
 3 1993_L Fred…    178     58    19 male   no     no      low     sat       74     76
 4 1993_P Math…    185    110    22 male   no     yes     low     sat       77     73
 5 1993_Q Lesl…    170     56    19 male   no     no      low     sat       64     63
 6 1993_U Jero…    175     60    19 male   no     no      low     sat       88     86
 7 1993_V Arle…    140     50    34 female no     no      low     ran       70     98
 8 1993_W Glen…    163     55    20 female no     no      low     sat       78     74
 9 1995_B Olga     172     60    21 female no     no      low     sat       81     79
10 1995_H Eliza    164     66    23 female no     no      low     ran       74    168
# … with 27 more rows, 1 more variable: year <dbl>, and abbreviated variable name
#   ¹​exercise
recreatedPulse <- bind_rows( l )
dim( pulse )
[1] 110  13
dim( recreatedPulse )
[1] 110  13
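bind_rows returns the rows grouped by exercise. If the original row order matters, base R’s unsplit (swapped in here; the exercise itself does not use it) puts the rows back where split took them from. A sketch on a small hypothetical data frame, toy, standing in for pulse; do.call( rbind, l ) plays the role of bind_rows.

```r
# Hypothetical stand-in for pulse: three columns, six rows
toy <- data.frame(
  id       = c( "a", "b", "c", "d", "e", "f" ),
  exercise = c( "low", "high", "moderate", "low", "high", "low" ),
  pulse1   = c( 90, 78, 74, 77, 64, 88 ),
  stringsAsFactors = FALSE
)

l <- split( toy, toy$exercise )   # list of data frames: high, low, moderate
names( l )

stacked  <- do.call( rbind, l )           # base R stacking, grouped by exercise
restored <- unsplit( l, toy$exercise )    # rows back in the original order

stacked$id    # "b" "e" "a" "d" "f" "c"
restored$id   # "a" "b" "c" "d" "e" "f"
```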
  2. (ADV) Split a table by a column and write each part to a separate file.
    Continue with the setup of the previous exercise.
    Study, type and execute the following example.
    Find the newly created files in your filesystem.
l <- pulse %>% split( .$exercise )
exercises <- names( l )                   # names of the table chunks in l
for( exercise in exercises ) {            # exercise holds the name of one chunk
  fileName <- paste0( "pulse_", exercise, ".csv" )  # name of the file for the chunk
  message( "Writing file '", fileName, "'..." )
  write_csv( l[[ exercise ]], file = fileName )
}
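To check the result, the files can be listed and read back. The sketch below is a base R variant (write.csv and read.csv instead of readr’s write_csv) that uses a small hypothetical table and writes into a subdirectory of the session’s temporary directory, so nothing is left in the working directory; outDir, files and parts are names chosen for this example.

```r
toy <- data.frame(
  exercise = c( "low", "high", "moderate", "low" ),
  pulse1   = c( 90, 78, 74, 77 ),
  stringsAsFactors = FALSE
)
outDir <- file.path( tempdir(), "pulse_parts" )   # temporary folder, not the working directory
dir.create( outDir, showWarnings = FALSE )

l <- split( toy, toy$exercise )
for( exercise in names( l ) ) {
  fileName <- file.path( outDir, paste0( "pulse_", exercise, ".csv" ) )
  write.csv( l[[ exercise ]], file = fileName, row.names = FALSE )
}

# List the written files and read them back into a list of data frames
files <- list.files( outDir, pattern = "^pulse_.*\\.csv$", full.names = TRUE )
parts <- lapply( files, read.csv )
basename( files )   # "pulse_high.csv" "pulse_low.csv" "pulse_moderate.csv"
```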


Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC